Semi-automatic Endogenous Enrichment of Collaboratively Constructed Lexical Resources: Piggybacking onto Wiktionary

نویسندگان

  • Franck Sajous
  • Emmanuel Navarro
  • Bruno Gaume
  • Laurent Prévot
  • Yannick Chudy
چکیده

The lack of large-scale, freely available and durable lexical resources, and the consequences for NLP, is widely acknowledged but the attempts to cope with usual bottlenecks preventing their development often result in dead-ends. This article introduces a language-independent, semi-automatic and endogenous method for enriching lexical resources, based on collaborative editing and random walks through existing lexical relationships, and shows how this approach enables us to overcome recurrent impediments. It compares the impact of using different data sources and similarity measures on the task of improving synonymy networks. Finally, it defines an architecture for applying the presented method to Wiktionary and explains how it has been implemented.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Semi-automatic enrichment of crowdsourced synonymy networks: the WISIGOTH system applied to Wiktionary

Semantic lexical resources are a mainstay of various Natural Language Processing applications. However, comprehensive and reliable resources are rare and not often freely available. Handcrafted resources are too costly for being a general solution while automatically-built resources need to be validated by experts or at least thoroughly evaluated. We propose in this paper a picture of the curre...

متن کامل

UBY - A Large-Scale Unified Lexical-Semantic Resource Based on LMF

We present UBY, a large-scale lexicalsemantic resource combining a wide range of information from expert-constructed and collaboratively constructed resources for English and German. It currently contains nine resources in two languages: English WordNet, Wiktionary, Wikipedia, FrameNet and VerbNet, German Wikipedia, Wiktionary and GermaNet, and multilingual OmegaWiki modeled according to the LM...

متن کامل

Accessing Multilingual Data on the Web for the Semantic Annotation of Cultural Heritage Texts

Our study targets interoperable semantic annotation of Cultural Heritage or eHumanities texts in German and Hungarian. A semantic resource we focus on is the Thompson Motif-index of folk-literature (TMI), the labels of which are available only in English. We investigate the use lexical data on the Web in German and Hungarian for supporting semi-automatic translation of TMI: lexical resources of...

متن کامل

UBY-LMF - A Uniform Model for Standardizing Heterogeneous Lexical-Semantic Resources in ISO-LMF

We present UBY-LMF, an LMF-based model for large-scale, heterogeneous multilingual lexical-semantic resources (LSRs). UBY-LMF allows the standardization of LSRs down to a fine-grained level of lexical information by employing a large number of Data Categories from ISOCat. We evaluate UBY-LMF by converting nine LSRs in two languages to the corresponding format: the English WordNet, Wiktionary, W...

متن کامل

Wiktionary: A new rival for expert-built lexicons? Exploring the possibilities of collaborative lexicography

With the rise of the Web 2.0, collaboratively constructed language resources are rivalling expert-built lexicons. The collaborative construction process of these resources is driven by what is called the “Wisdom of Crowds” phenomenon, which offers very promising research opportunities in the context of electronic lexicography. The vast number and broad diversity of authors yield, for instance, ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010